1. INTRODUCTION

The game industry is facing a surge of data resulting from the increasing availability of highly detailed
information about the behavior of software and users. Teenagers enjoy playing online games that are
challenging but fair to all players. Before online games became popular, gamers usually played single-player
games such as Mario and Hugo. Over time, games evolved into massive multiplayer online games.
Massive Multiplayer Online (MMO) games allow gamers to interact and play with up to thousands of other
players. Gamers can stay at home and play with their friends as long as they have an internet connection.
Online games became more popular with the arrival of platforms such as STEAM, GARENA and ORIGIN, which
provide collections of online games and allow gamers to communicate with each other and form communities
around selected games. Strategy and first-person shooter are the most well-known genres among gamers.

Data mining has been used to improve game strategy [1]. One of the main challenges in game data mining
is to predict strategies so that the game becomes more challenging and fair to all players.
For a massive multiplayer game such as Rainbow Six Siege (RSS), data mining techniques can be used to predict
which operator the user should choose to counter the ability of an enemy operator. RSS comes with multiple
operators, each with an individual ability that can be countered in the game; thus, it is useful to predict which
operator is suitable for a user to challenge the opponent operator's ability. A large number of replays were
collected from live streams of an international tournament. Each replay is divided into many rounds and is
labelled based on the player's chosen operator and class.

This paper tests the accuracy of three classification algorithms in predicting the outcome of a massive
multiplayer online game. The remainder of this paper is organized as follows. Section 2 discusses past studies
of MMO and data mining in games. Section 3 describes the methods involved in this research. The
performance measurements used are discussed in Section 4. Section 5 presents the results, and the conclusion
and future works are provided in Section 6.


2. BACKGROUND OF THE STUDY 

This research focuses on predicting the operator that the user should use in RSS based on the opponent's
operator. There are several approaches to recognizing patterns in operator selection. This section
gives an overview of massive multiplayer online games and of previous studies on games.


2.1. Massive Multiplayer Online Game (MMOG) 

An online game is a game controlled by a computer in which the player interacts through a computer
network [2]. Massive multiplayer online games (MMOG) have captured attention in the game industry since
millions of people participate in battles on the same server [3]. Table 1 shows various MMOGs
available on the internet. MMOGs produce complex data that can be divided into sub-problems; thus, data
mining is needed to study the patterns or strategies of a game in order to increase the winning ratio.
There are several papers on game data mining. Analyzing behavior data from massive multiplayer games
can be challenging since it involves huge numbers of active players across the world. The study in [4]
suggested data mining to explore the data and discover patterns that can reduce its overall complexity.

A data mining algorithm was used to identify the difference between human players and AI bots in [9]. In
that study, 142 replays were collected and used to train a machine learning model. The results show that it is
possible to increase the win ratio if the bots are able to recognize the opponent's strategy and successfully
execute the build order.

Several approaches have been used to model the opponent [10, 11, 18]. These approaches were
applied to real-time strategy games to predict the strategy of the opponent so that the player can respond
accordingly. KNN, decision tree and logistic regression classifiers were applied to predict the opponent's
strategy [10]. A Bayesian model was used to predict the outcomes of isolated battles and to predict which
units should be used to defeat a given army [11]. The goal of the opponent modeling approach is to learn and
select which combinations of units are more effective against others.


Table 1. Massive Multiplayer Online Game

Title                    Graphics  Setting              Subscription Model  Release Date
Star Wars Galaxies       3D        Science Fiction      Pay to Play         2003
World of Warcraft        3D        Fantasy              Pay to Play         2004
Rainbow Six Siege (RSS)  3D        Science Fiction      Buy to Play         2015
Tibia                    2D        Fantasy              Free to Play        1997
Xyson                    3D        Apocalyptic Fantasy  Buy to Play         2011





Rainbow Six Siege (RSS) has been one of the successful MMOGs, with a huge number of players all over
the world. Many websites store recorded gameplay with all gaming events so that the games of good players
can be replayed. As a result, the RSS community has accumulated its experience in the form of
replays over several years. Recently, RSS has gained a lot of interest in the gaming industry. Behind
Dota 2 and Counter-Strike, RSS placed third among the most popular games in the world. Figure 1 shows the
most popular MMOGs in the world in December 2017.

Current Players  Peak Today  Game
307,644          719,771     Dota 2
180,592          515,266     Counter-Strike: Global Offensive
49,879           70,620      Tom Clancy's Rainbow Six Siege
41,985           50,534      Team Fortress 2

Figure 1. Example of Massive Multiplayer Game Online


2.2. Overview of Rainbow Six Siege 

Rainbow Six Siege (RSS) is a popular first-person shooter with real-time strategy elements in which
players are divided into two teams, attack and defend. Players can choose from 36 operators, 18 per team,
each with different abilities. The operators are classified into four classes on each of the attack and defend
sides. Ten players are connected, either randomly or as a party, with five players per team. The teams have
different tasks in the early stage of the game (the first 30 seconds): the attackers use drones to find the
location of the objective and identify the operators chosen by the defenders, while the defenders must protect
the objective, so they barricade or reinforce walls in order to slow down or prevent the attackers from
securing it. There are several gameplay modes (plant the bomb, rescue the hostage and secure the area) across
seventeen different maps. The attackers win by killing all enemies or securing the objective, while the
defenders win by either killing all enemies or defending the objective until the time limit expires. Studies
have been conducted to predict opponent positioning using opponent modeling in first-person shooters [5, 6].
This research focuses on one map, Presidential Plane, since the map is popular and has many replays on the web.


3. METHODOLOGY 

The methodology comprises the specific steps taken during the project investigation. Figure 2 shows the
phases involved in this research. It starts with dataset development, followed by feature extraction and
classification by three classifiers: Naive Bayes, IBK and J48. It is then followed by results and evaluation.






[Flowchart: Dataset Development -> Feature Extraction -> Classification (Naive Bayes, IBK, J48) -> Results and Evaluation]

Figure 2. Research Flowchart


3.1. Dataset Development

In the gaming community, several websites are dedicated to collecting and sharing RSS replays. These
websites share a large number of replays from professional and high-ranked amateur matches.
Therefore, it is possible to mine these websites in order to build a collection of the strategies that a user
can use to win in RSS. Due to the large number of replays available, it is possible to learn a variety of
operator abilities and strategies on several maps against different play styles, which helps users improve
their skills. The data can help improve online matchmaking, player behavior modelling and strategy
prediction [10-12]. The replays were collected from www.reddit.com and http://steamcommunity.com, two
popular websites for RSS. The collections of replays from the professional tournament were downloaded from
these websites. The scope of this research is limited to one map only. Table 2 shows the list of operators by
class in RSS.


Table 2. List of RSS Operators based on Class

Attack                                                  Defender
Class     Operator                                      Class         Operator
Breacher  Ash, Thermite, Hibana, Sledge, Zofia          Trap          Frost, Kapkan, Ela, Lesion
Shield    Montagne, Blitz                               Core Defense  Castle, Mute, Rook, Doc, Bandit, Jaeger
Damage    Glaz, Fuze, Blackbeard, Capitao, Buck         Damage        Vigil, Tachanka, Caveira, Smoke
Utility   Twitch, Ying, Thatcher, IQ, Dokkaebi, Jackal  Information   Valkyrie, Mira, Echo, Pulse





The remainder of this section describes the data mining approach used to automatically learn strategies
from the collection of replays and subsequently predict a win or loss in the RSS game.


3.2. Feature Extraction 

RSS replays were collected for the secure area game mode on the Presidential Plane map. A dataset
containing 32 games was created based on the operator classes chosen and the result of the operator ability
counter. Table 3 presents the characteristics of the operator classes for the attacker and defender teams in
RSS; the information was extracted from rainbowsixsiege.fandom.


Table 3. Characteristic of Operator Class

Class            Armor                Movement       Flexibility
                 (Low, Medium, High)  (Speed 1,2,3)  (Low, Medium, High)
Breacher         Low                  2              Medium
Shield           High                 1              Medium
Damage (Attack)  Low                  3              Medium
Utility          High                 2              High
Trap             Medium               2              High
Core             High                 1              Low
Damage (Defend)  Low                  3              High
Information      Medium               3              High





The goal of this research is to build a general model of RSS that can predict the outcome of the
game based on operator selection. This differs from previous studies, which focused on players' actions.
By applying data mining to game replays, the algorithm can develop a set of rules that predict the result of
the game; the user can then learn the operators' abilities through the suggestions of the algorithm and
improve their strategy to win the game. Table 4 shows the data used in this study, based on the matchups
recorded from the replays. Every attack operator is matched against every defend operator, and the win or
loss result is taken from the replay.

Table 4. Characteristics of selected RSS class operator

Class 1       Class 2   Armor             Movement          Flexibility       Result
                        Class 1  Class 2  Class 1  Class 2  Class 1  Class 2
Breacher      Trap      Low      Medium   Speed 2  Speed 2  Medium   High     Lose
Damage        Trap      Low      Medium   Speed 3  Speed 2  Medium   High     Lose
Trap          Utility   Medium   High     Speed 2  Speed 3  High     High     Lose
Core Defense  Breacher  High     Low      Speed 1  Speed 2  Low      Medium   Lose
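As an illustration of how rows like those in Table 4 might be assembled, the sketch below looks up the Table 3 characteristics for two operator classes and joins them into one record. The names `CLASS_TRAITS` and `encode_matchup` and the field names are hypothetical; the paper does not state the file format that was actually used.

```python
# Characteristics per operator class, copied from Table 3 as (armor, speed, flexibility).
# The dictionary layout and all field names are illustrative assumptions.
CLASS_TRAITS = {
    "Breacher": ("Low", 2, "Medium"),
    "Shield": ("High", 1, "Medium"),
    "Damage (Attack)": ("Low", 3, "Medium"),
    "Utility": ("High", 2, "High"),
    "Trap": ("Medium", 2, "High"),
    "Core": ("High", 1, "Low"),
    "Damage (Defend)": ("Low", 3, "High"),
    "Information": ("Medium", 3, "High"),
}


def encode_matchup(class_1, class_2, result):
    """Build one labelled record from two operator classes and the round result."""
    armor_1, speed_1, flex_1 = CLASS_TRAITS[class_1]
    armor_2, speed_2, flex_2 = CLASS_TRAITS[class_2]
    return {
        "class_1": class_1, "class_2": class_2,
        "armor_1": armor_1, "armor_2": armor_2,
        "speed_1": speed_1, "speed_2": speed_2,
        "flex_1": flex_1, "flex_2": flex_2,
        "result": result,
    }


# Reproduces the first row of Table 4 (Breacher vs. Trap, Lose).
row = encode_matchup("Breacher", "Trap", "Lose")
print(row)
```

Keeping the class characteristics in a single lookup table means each replay only needs to record the two classes and the result.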





3.3. Classification Algorithms 
Three classifiers are used in this study: Naive Bayes, IBK (k-Nearest Neighbor) and J48 (decision
tree).


3.3.1. Naive Bayes 


Naive Bayes is used to predict the class depending on the probability of belonging to the class [13]. 
The probability is calculated based on Equation 1. 


$$P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)} \tag{1}$$


where $P(c \mid x)$ is the posterior probability of the class (target) given the predictor (attribute), $P(c)$ is
the prior probability of the class, $P(x \mid c)$ is the likelihood, i.e. the probability of the predictor given
the class, and $P(x)$ is the prior probability of the predictor.
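To make Equation 1 concrete, the following is a minimal sketch of a Naive Bayes posterior for nominal attributes, assuming the attributes are conditionally independent given the class. The toy data and all function names are hypothetical, and no smoothing is applied, so this is not a reproduction of the Weka implementation used in the study.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-attribute value frequencies per class."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def posterior(row, class_counts, value_counts):
    """Return P(c|x) for every class c, following Equation 1."""
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(c)
        for i, value in enumerate(row):
            # likelihood factor P(x_i | c), estimated from counts
            score *= value_counts[(i, c)][value] / n_c
        scores[c] = score
    evidence = sum(scores.values())  # P(x), obtained by summing over classes
    return {c: (s / evidence if evidence else 0.0) for c, s in scores.items()}


# Hypothetical toy data: (armor, flexibility) -> result
rows = [("Low", "High"), ("Low", "Medium"), ("High", "High"), ("High", "Medium")]
labels = ["Lose", "Lose", "Win", "Lose"]
class_counts, value_counts = train_naive_bayes(rows, labels)
post = posterior(("High", "High"), class_counts, value_counts)
print(post)
```

The class with the largest posterior is the prediction; dividing by the evidence only rescales the scores so that they sum to one.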


3.3.2. IBK (k-Nearest Neighbor) 

Lazy learning is a learning method in which generalization of the training data is delayed until a
query is made to the system, as opposed to eager learning, in which the system tries to generalize the
training data before receiving queries [14]. Lazy learning can deal with changes in the problem area and solve
multiple problems [15]. The IBk classifier is a k-Nearest Neighbor (KNN) classifier in which k is a
user-defined constant [19]. The similarities of different instances are calculated using Euclidean distance
as in Equation 2.





$$d_{ij} = \sqrt{\sum_{s} \left( V_{is} - V_{js} \right)^{2}} \tag{2}$$


where $V_i$ is the vector of the i-th instance, $V_{is}$ is the s-th element of vector $V_i$, and $d_{ij}$ is
the distance between $V_i$ and $V_j$.
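The distance in Equation 2 and the nearest-neighbour vote can be sketched as follows. The numeric encoding of the attributes (Low=1, Medium=2, High=3), the toy data and the function names are assumptions made for illustration; they are not taken from the paper or from Weka's IBk.

```python
import math
from collections import Counter


def euclidean(v_i, v_j):
    """Euclidean distance between two equal-length numeric vectors (Equation 2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v_i, v_j)))


def knn_predict(train_vectors, train_labels, query, k=1):
    """Predict the majority label among the k training vectors nearest to the query."""
    order = sorted(range(len(train_vectors)),
                   key=lambda idx: euclidean(train_vectors[idx], query))
    votes = Counter(train_labels[idx] for idx in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical encoding: (armor level, speed, flexibility level)
train_vectors = [(1, 2, 2), (3, 1, 2), (1, 3, 2), (3, 2, 3)]
train_labels = ["Lose", "Win", "Lose", "Win"]
prediction = knn_predict(train_vectors, train_labels, (3, 1, 3), k=3)
print(prediction)
```

No model is built in advance: all work happens when `knn_predict` is called, which is the lazy behaviour described above.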


3.3.3. J48 (Decision Tree) 

The J48 classifier (the Weka [17] implementation of the well-known C4.5 decision tree algorithm)
generates rules to predict the output variable and also helps describe the critical contributions in an
easily understandable form [16]. Additional features of J48 include handling of missing values, decision
tree pruning, continuous attribute value ranges and derivation of rules.
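C4.5 selects the attribute to split on using the gain ratio, i.e. information gain normalised by the entropy of the split itself. The sketch below computes that criterion for one nominal attribute. It is only an illustration of the splitting measure: pruning, continuous attributes and missing values are not handled, and the toy data and function names are hypothetical.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in a list."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the nominal attribute `values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the labels after the split
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the partition sizes
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: armor separates the results perfectly, map side does not.
armor = ["Low", "Low", "High", "High"]
side = ["A", "B", "A", "B"]
result = ["Lose", "Lose", "Win", "Win"]
print(gain_ratio(armor, result), gain_ratio(side, result))
```

A tree builder would evaluate this measure for every candidate attribute and split on the one with the highest value.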


4. PERFORMANCE MEASUREMENT 

The results of the classifiers are evaluated using accuracy, true positive (TP) rate, precision,
F-measure and mean absolute error (MAE). Accuracy assesses the overall effectiveness of the algorithm. It is
given by Equation 3.


$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \times 100 \tag{3}$$





where TP (true positive) and TN (true negative) are the numbers of correctly predicted positive and 
negative samples respectively. FP (false positive) and FN (false negative) are the numbers of incorrectly 
predicted positive and negative samples, respectively. 

The TP rate (also called recall) is the ratio of correctly predicted positive cases to the total number of
positive cases, TP/(TP+FN). In this evaluation, the TP rate gives the proportion of actually positive game
results that are predicted as positive.

Precision is defined as the positive predictive value: it measures how many of the positive predictions
for the result of the game are correct. Equation 4 shows the formula to calculate precision.


Precision= TP/(TP+FP) (4) 


where TP refers to true positives and FP refers to false positives. F-Measure combines recall and
precision into a single performance measure. Equation 5 shows the formula to calculate the F-measure.


F-Measure= 2 * Precision * Recall / (Precision +Recall) (5) 
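Equations 3 to 5 can be computed directly from the four confusion-matrix counts, as in the sketch below. The counts are hypothetical and do not come from the experiments in this paper; the function assumes that none of the denominators is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy (%), TP rate, precision and F-measure from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn) * 100  # Equation 3
    recall = tp / (tp + fn)                           # TP rate
    precision = tp / (tp + fp)                        # Equation 4
    f_measure = 2 * precision * recall / (precision + recall)  # Equation 5
    return {
        "accuracy": accuracy,
        "tp_rate": recall,
        "precision": precision,
        "f_measure": f_measure,
    }


# Hypothetical counts for 32 instances, chosen only for illustration.
metrics = classification_metrics(tp=12, fp=4, fn=2, tn=14)
print(metrics)
```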


The mean absolute error (MAE) is used to measure how far the predictions differ from the actual values [20]. 
The formula for MAE is given in Equation 6. 





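$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert f_i - y_i \rvert \tag{6}$$


where $n$ is the number of errors and $\lvert f_i - y_i \rvert$ are the absolute errors between the
prediction $f_i$ and the actual value $y_i$.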
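A minimal sketch of Equation 6 follows. It assumes that each prediction is expressed as a probability of the positive class and each actual value as 0 or 1; that encoding, the toy values and the function name are assumptions for illustration only.

```python
def mean_absolute_error(predicted, actual):
    """Average of the absolute differences between predictions and actual values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(f - y) for f, y in zip(predicted, actual)) / len(predicted)


# Hypothetical predicted probabilities of "Win" against the actual outcomes (1 = Win).
predicted = [0.9, 0.2, 0.6, 0.1]
actual = [1, 0, 1, 0]
mae = mean_absolute_error(predicted, actual)
print(mae)
```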


5. RESULTS AND EVALUATION 

This section focuses on the evaluation of the selected classification algorithms. Table 5 shows the
accuracy of each classification algorithm: Naive Bayes, IBK and J48. The results were obtained using
5-fold cross validation with the Weka [17] toolkit.
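The idea behind 5-fold cross validation can be sketched as below: the instances are shuffled, split into five folds, and each fold serves once as the test set while the other four form the training set. This is a plain, unstratified illustration with hypothetical names; it does not reproduce how Weka assigns instances to folds.

```python
import random


def k_fold_indices(n_instances, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 32 instances, matching the size of the dataset described in Section 3.2.
splits = list(k_fold_indices(32, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Because every instance appears in exactly one test fold, the reported accuracy is an average over predictions for all 32 games.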


Table 5. Classification algorithm and overall accuracy

Classifier   Accuracy (%)
Naive Bayes  75
IBK          94
J48          66





All the classifiers predict the outcome of the game according to the operator classes selected. From
Table 5, it can be concluded that the IBK algorithm has the highest accuracy, reaching 93.75%, while
Naive Bayes reaches 75% and J48 has the lowest accuracy at 65%. Figure 3 shows the performance evaluation
of the Naive Bayes, IBK and J48 classification algorithms. It compares three performance measures: True
Positive (TP) rate, precision and F-Measure.
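As a consistency check, and assuming that all 32 games in the dataset are each classified once under cross validation, the reported accuracies correspond to whole numbers of correctly classified instances. These counts are inferred from the percentages and are not stated in the paper.

```latex
\[
\frac{30}{32} \times 100 = 93.75\%, \qquad
\frac{24}{32} \times 100 = 75\%, \qquad
\frac{21}{32} \times 100 \approx 65.6\%
\]
```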


[Bar chart: Performance Evaluation of Classification Algorithm. Series: Naive Bayes, IBK, J48. Categories: Correctly Classified Instances, Incorrectly Classified Instances, TP Rate, Precision.]

Figure 3. Performance measurement for classification algorithm

From Figure 4, it is observed that the J48 algorithm obtains the highest error rate while IBK obtains the
lowest. Therefore, the IBK classification algorithm performs well, combining the lowest error rate with the
highest accuracy.


[Bar chart: Mean Absolute Error (MAE) by classifier (Naive Bayes, IBK, J48); bar labels 35.49, 27.62 and 15.27.]

Figure 4. Error rate for classification algorithm


6. CONCLUSION AND FUTURE WORKS 

This paper determines the best algorithm by comparing three classification techniques for predicting the
outcome of RSS. The study focuses on three classification techniques: Naive Bayes, J48 and IBK.
Experiments have been conducted to determine the performance of each algorithm. Among the three
classification algorithms, it can be concluded that IBK is the best algorithm for predicting the outcome of
an RSS game. Therefore, the IBK model is suitable for prediction in MMO games, since it performs better
than Naive Bayes and J48. As for future work, the author plans to investigate the effect of feature
selection on game outcome prediction. Furthermore, other classifiers should also be used in future
experiments.